AITopics | practical loss-based stepsize adaptation

L4: Practical loss-based stepsize adaptation for deep learning

Neural Information Processing SystemsNov-20-2025, 22:42:26 GMT

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others.

deep learning, name change, practical loss-based stepsize adaptation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)

Add feedback

L4: Practical loss-based stepsize adaptation for deep learning

Michal Rolinek, Georg Martius

Neural Information Processing SystemsNov-20-2025, 18:36:06 GMT

We propose a stepsize adaptation scheme for stochastic gradient descent.

artificial intelligence, machine learning, optimizer, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reviews: L4: Practical loss-based stepsize adaptation for deep learning

Neural Information Processing SystemsOct-7-2024, 20:45:45 GMT

The paper proposes a scheme for adaptive choice of learning rate for stochastic gradients descent and its variants. The key idea is very simple and easy to implement: given the loss value L at the global minimum, L_min, the idea is to choose learning rate eta, such that the update along the gradient reaches L_min from the current point i.e. solving L(theta-eta*v) L_min in eta, where v is for example dL/dtheta in gradient descent. Finally, to make the adaptive learning rate pessimistic to the possible linearization error, the authors introduce a coefficient alpha, so the effective learning rate used by the optimizer is eta*alpha. The authors empirically show (on badly conditioned regression, MNIST, CIFAR-10, and neural computer) that using such adaptive scheme helps in two ways: 1. the optimization performance is less sensitive to the choice of the coefficient alpha vs the learning rate (in non-adaptive setting), and 2. the optimizer can reduce the loss faster or at worst in equal speed with commonly used optimizers. At the same time, the paper has some shortcomings as admitted by the authors: 1.

deep learning, gradient descent, practical loss-based stepsize adaptation, (8 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.80)

Add feedback

L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, Michal, Martius, Georg

Neural Information Processing SystemsFeb-14-2020, 18:28:00 GMT

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others. Papers published at the Neural Information Processing Systems Conference.

artificial intelligence, machine learning, practical loss-based stepsize adaptation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, Michal, Martius, Georg

Neural Information Processing SystemsDec-31-2018

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others.

artificial intelligence, machine learning, optimizer, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, Michal, Martius, Georg

Neural Information Processing SystemsDec-31-2018

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by conclusively improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including dense nets, CNNs, ResNets, and the recurrent Differential Neural Computer on classical datasets MNIST, fashion MNIST, CIFAR10 and others.

artificial intelligence, machine learning, optimizer, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany (0.28)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

L4: Practical loss-based stepsize adaptation for deep learning

Rolinek, Michal, Martius, Georg

arXiv.org Machine LearningFeb-21-2018

We propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. We demonstrate its capabilities by strongly improving the performance of Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures including ResNets and the Differential Neural Computer. A prototype implementation as a TensorFlow optimizer is released.

artificial intelligence, learning rate, machine learning, (17 more...)

arXiv.org Machine Learning

1802.05074

Country: Europe > Germany (0.28)

Genre: Research Report (0.82)

Technology: